#install.packages("plotly", repos = "http://cran.us.r-project.org")
library(ggplot2)
library(plotly)
library(data.table)
library(RColorBrewer)
This project is an exploration of the MyAnimeList dataset provided on Kaggle.com.
To start, we import the data into R and inspect its structure:
anime_data <- fread("dataanime.csv")
str(anime_data)
## Classes 'data.table' and 'data.frame': 1563 obs. of 20 variables:
## $ Title : chr "Fullmetal Alchemist: Brotherhood" "Kimi no Na wa." "Gintama°" "Steins;Gate 0" ...
## $ Type : chr "TV" "Movie" "TV" "TV" ...
## $ Episodes : chr "64" "1" "51" "23" ...
## $ Status : chr "Finished Airing" "Finished Airing" "Finished Airing" "Currently Airing" ...
## $ Start airing : chr "2009-4-5" "2016-8-26" "2015-4-8" "2018-4-12" ...
## $ End airing : chr "2010-7-4" "-" "2016-3-30" "-" ...
## $ Starting season: chr "Spring" "-" "Spring" "Spring" ...
## $ Broadcast time : chr "Sundays at 17:00 (JST)" "-" "Wednesdays at 18:00 (JST)" "Thursdays at 01:35 (JST)" ...
## $ Producers : chr "Aniplex,Square Enix,Mainichi Broadcasting System,Studio Moriken" "Kadokawa Shoten,Toho,Sound Team Don Juan,Lawson HMV Entertainment,Amuse,East Japan Marketing & Communications" "TV Tokyo,Aniplex,Dentsu" "Nitroplus" ...
## $ Licensors : chr "Funimation,Aniplex of America" "Funimation,NYAV Post" "Funimation,Crunchyroll" "Funimation" ...
## $ Studios : chr "Bones" "CoMix Wave Films" "Bandai Namco Pictures" "White Fox" ...
## $ Sources : chr "Manga" "Original" "Manga" "Visual novel" ...
## $ Genres : chr "Action,Military,Adventure,Comedy,Drama,Magic,Fantasy,Shounen" "Supernatural,Drama,Romance,School" "Action,Comedy,Historical,Parody,Samurai,Sci-Fi,Shounen" "Sci-Fi,Thriller" ...
## $ Duration : chr "24 min. per ep." "1 hr. 46 min." "24 min. per ep." "23 min. per ep." ...
## $ Rating : chr "R" "PG-13" "R" "PG-13" ...
## $ Score : num 9.25 9.19 9.16 9.16 9.14 9.11 9.11 9.11 9.1 9.07 ...
## $ Scored by : int 719706 454969 70279 12609 552791 28452 90758 395162 26284 62582 ...
## $ Members : int 1176368 705186 194359 186331 990419 121772 212238 705225 80166 121612 ...
## $ Favorites : int 105387 33936 5597 1117 90365 8370 4533 63324 1961 1498 ...
## $ Description : chr "\"\"In order for something to be obtained, something of equal value must be lost.\"\"\r\n\r\nAlchemy is bound b"| __truncated__ "Mitsuha Miyamizu, a high school girl, yearns to live the life of a boy in the bustling city of Tokyoâ\200”a dre"| __truncated__ "Gintoki, Shinpachi, and Kagura return as the fun-loving but broke members of the Yorozuya team! Living in an al"| __truncated__ "The dark untold story of Steins;Gate that leads with the eccentric mad scientist Okabe, struggling to recover f"| __truncated__ ...
## - attr(*, ".internal.selfref")=<externalptr>
The data table holds 1563 anime described by 20 variables, including Title, Score, Sources and Broadcast time.
First, we change the categorical columns into factors:
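A minimal sketch of this conversion with data.table. The choice of columns (`factor_cols`) and the small stand-in table `demo` are illustrative assumptions, not taken from the original analysis; with the real data the same call would be applied to `anime_data`.

```r
library(data.table)

# Assumed column selection: the single-valued categorical fields seen in str() above
factor_cols <- c("Type", "Status", "Starting season", "Sources", "Rating")

# Hypothetical stand-in table so the sketch runs on its own
demo <- data.table(
  Type = c("TV", "Movie", "TV"),
  Status = c("Finished Airing", "Finished Airing", "Currently Airing"),
  `Starting season` = c("Spring", "-", "Spring"),
  Sources = c("Manga", "Original", "Manga"),
  Rating = c("R", "PG-13", "R"),
  Score = c(9.25, 9.19, 9.16)
)

# Convert the chosen columns to factors by reference; other columns are untouched
demo[, (factor_cols) := lapply(.SD, as.factor), .SDcols = factor_cols]

str(demo)
```

Multi-valued columns such as Genres or Producers are left as character here, because each cell holds a comma-separated list rather than a single category.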
Next, we convert the date columns into the proper date format in R:
# Entries that are not valid dates (such as the "-" placeholder) become NA
anime_data$`Start airing` <-
  as.Date(anime_data$`Start airing`, format = "%Y-%m-%d")
anime_data$`End airing` <-
  as.Date(anime_data$`End airing`, format = "%Y-%m-%d")
Now we plot what percentage of our data is missing (NA):
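One way such a plot could be produced is sketched below. It assumes that the "-" placeholder in text columns should count as missing, and it uses a small hypothetical table `demo` in place of `anime_data`; the names `char_cols` and `missing_pct` are also illustrative.

```r
library(data.table)
library(ggplot2)

# Hypothetical stand-in for anime_data
demo <- data.table(
  Title = c("A", "B", "C", "D"),
  `End airing` = as.Date(c("2010-07-04", NA, "2016-03-30", NA)),
  `Starting season` = c("Spring", "-", "Spring", "Spring"),
  Score = c(9.25, 9.19, 9.16, 9.16)
)

# Assumption: "-" marks a missing value in character columns
char_cols <- names(demo)[sapply(demo, is.character)]
demo[, (char_cols) := lapply(.SD, function(x) {
  x[x %in% "-"] <- NA
  x
}), .SDcols = char_cols]

# Percentage of NA values per column
missing_pct <- data.table(
  column = names(demo),
  pct_missing = sapply(demo, function(x) 100 * mean(is.na(x)))
)

# Horizontal bar chart, columns ordered by share of missing values
p <- ggplot(missing_pct, aes(x = reorder(column, pct_missing), y = pct_missing)) +
  geom_col() +
  coord_flip() +
  labs(x = "Column", y = "Missing values (%)")
print(p)
```

If the columns have already been converted to factors, the "-" level would need to be recoded to NA before counting, since `is.na()` does not treat it as missing.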
In this part, we ask a series of important questions about the data and try to answer each of them with a plot:
I wanted to use a pie chart here, but had to fall back on a barplot because of a problem with coord_polar("y"). Apparently this is currently an open issue in the ggplot2 package.